A Survey of Predictive Modelling under Imbalanced Distributions
Abstract
Many real-world data mining applications involve obtaining predictive models from data sets with strongly imbalanced distributions of the target variable. Frequently, the least common values of this target variable are associated with events that are highly relevant for end users (e.g. fraud detection, unusual returns on stock markets, anticipation of catastrophes, etc.). Moreover, these events may carry different costs and benefits. Combined with the rarity of some of them in the available training data, this creates serious problems for predictive modelling techniques. This paper presents a survey of existing techniques for handling these important applications of predictive analytics. Although most of the existing work addresses classification tasks (nominal target variables), we also describe methods designed to handle similar problems within regression tasks (numeric target variables). In this survey we discuss the main challenges raised by imbalanced distributions, describe the main approaches to these problems, propose a taxonomy of these methods and refer to some related problems within predictive modelling.
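The problem described above can be made concrete with a small toy computation. The sketch below assumes a hypothetical binary data set in which 1% of the cases are the rare, relevant events. It also uses illustrative cost values: a missed rare event costs 100 and a false alarm costs 1. The data, the costs and the helper functions (`accuracy`, `recall`, `total_cost`) are assumptions for illustration only. They are not taken from the survey. The sketch shows that plain accuracy favours a model that never predicts the rare value. A cost-based view instead favours a model that catches the rare events at the price of some false alarms.

```python
# Hypothetical toy data: 1 marks a rare, relevant event (e.g. a fraud case),
# 0 marks a common case. 10 rare events out of 1000 cases (1%).
y_true = [0] * 990 + [1] * 10

# Model A (hypothetical): always predicts the majority value.
y_majority = [0] * 1000

# Model B (hypothetical): flags the last 30 cases, catching all 10 rare
# events but also raising 20 false alarms.
y_alt = [0] * 970 + [1] * 30


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of the rare (positive) cases that the model detects."""
    total = sum(1 for t in y_true if t == positive)
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    return hits / total if total else 0.0


def total_cost(y_true, y_pred, cost_fn=100.0, cost_fp=1.0):
    """Sum of error costs; cost_fn and cost_fp are illustrative assumptions."""
    cost = 0.0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 0:
            cost += cost_fn  # missed rare event (false negative)
        elif t == 0 and p == 1:
            cost += cost_fp  # false alarm (false positive)
    return cost


for name, preds in [("majority", y_majority), ("alt", y_alt)]:
    print(name, accuracy(y_true, preds), recall(y_true, preds), total_cost(y_true, preds))
# majority 0.99 0.0 1000.0
# alt 0.98 1.0 20.0
```

Accuracy treats every error alike. Under a 99:1 split, it can therefore hide a complete failure on the cases users care about. Recall and an explicit cost matrix expose that failure.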
Related articles
Predicting Customer Online Shopping Adoption - an Evaluation of Data Mining and Market Modelling Approaches
Accurate prediction of shopping channel preferences has become an important issue for retailers seeking to maximize customer loyalty. In data mining, novel approaches such as neural networks (NN) have been proposed to predict the probability of class memberships in addition to statistical methods from marketing modelling. However, Data Mining suggests new approaches to data preprocessing in ord...
Comparing Discriminant Analysis, Ecological Niche Factor Analysis and Logistic Regression Methods for Geographic Distribution Modelling of Eurotia ceratoides (L.) C. A. Mey
Eurotia ceratoides (L.) C. A. Mey is an important plant species in semi-arid lands in Iran. New approaches are required to determine the distribution of this plant species. For this reason, geographical distributions of Eurotia ceratoides were assessed using three different models including: Multiple Discriminant Analysis (MDA), Ecological Niche Factor Analysis (ENFA) and Logistic Regression (LR). ...
Model Predictive Control of Distributed Energy Resources with Predictive Set-Points for Grid-Connected Operation
This paper proposes an MPC-based (model predictive control) scheme to control active and reactive powers of DERs (distributed energy resources) in a grid-connected mode (either through a bus with its associated loads as a PCC (point of common coupling) or an MG (micro-grid)). DER may be a DG (distributed generation) or an ESS (energy storage system). In the proposed scheme, the set-poin...
Polichotomies on Imbalanced Domains by One-per-Class Compensated Reconstruction Rule
A key issue in machine learning is the ability to cope with recognition problems where one or more classes are under-represented with respect to the others. Indeed, traditional algorithms fail under class imbalanced distribution resulting in low predictive accuracy over the minority classes. While a large literature exists on binary imbalanced tasks, little research exists on multiclass learning. ...
A Prediction for Classification of Highly Imbalanced Medical Dataset Using Databoost.IM with SVM
Recently, class imbalance problems have attracted growing interest because of the classification difficulty caused by imbalanced class distributions. In particular, many ensemble learning and machine learning methods have been proposed for the classification of imbalanced problems. However, these methods produce poor predictive accuracy of classification for two-class imbalanced datasets. In this paper,...
Journal: CoRR
Volume: abs/1505.01658
Publication date: 2015